A Novel Semi-supervised Method for Named Entity Detection
Abstract
Machine Learning (ML) based approaches to the Named Entity Recognition (NER) task require Named Entity (NE) annotated data to train the classifier. If the amount of NE-annotated data is insufficient, the classifier may not yield good results. NE-annotated data is scarce, especially for resource-poor languages, but in most cases large raw corpora are available. In this paper we describe a novel approach that uses an additional raw corpus to improve the performance of a Maximum Entropy (MaxEnt) based classifier. The proposed methodology is applied to the NER task in Hindi. Experimental results demonstrate the effectiveness of the proposed approach, which might be helpful in Natural Language Processing (NLP) tasks in resource-poor languages.
Similar resources
Semi-Supervised Text Mining For Dynamic Business Network Discovery
Recently, much research effort has been devoted to the discovery and analysis of online social networks. However, relatively little research has been done for business network discovery and analysis. Although named entity recognition (NER) tools are available to identify basic entities in texts, there are still challenging research problems, such as co-reference resolution and the identificatio...
Semi-supervised Bio-named Entity Recognition with Word-Codebook Learning
We describe a novel semi-supervised method called Word-Codebook Learning (WCL), and apply it to the task of bio-named entity recognition (bioNER). Typical bioNER systems can be seen as tasks of assigning labels to words in bio-literature text. To improve supervised tagging, WCL learns a class of word-level feature embeddings to capture word semantic meanings or word label patterns from a large unl...
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Named entity recognition (NER) plays an important role in the NLP literature. Traditional methods tend to rely on a large annotated corpus to achieve high performance. Unlike many semi-supervised learning models for the NER task, in this paper we employ the graph-based semi-supervised learning (GBSSL) method to utilize the freely available unlabeled data. The experiment shows that the u...
Locating Complex Named Entities in Web Text
Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of predefined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a ...
Semi-supervised structured prediction models
Learning mappings between arbitrary structured input and output variables is a fundamental problem in machine learning. It covers many natural learning tasks and challenges the standard model of learning a mapping from independently drawn instances to a small set of labels. Potential applications include classification with a class taxonomy, named entity recognition, and natural language parsin...
Semi-supervised Statistical Inference for Business Entities Extraction and Business Relations Discovery
The sheer volume of user-contributed data on the Internet has motivated organizations to explore collective business intelligence (BI) for improving business decision making. One common problem for BI extraction is to accurately identify the entities being referred to in user-contributed comments. Although named entity recognition (NER) tools are available to identify basic entities in tex...